Automatic voice onset time detection for unvoiced stops (/p/, /t/, /k/) with application to accent classification
Abstract
Articulation characteristics of particular phonemes can provide cues to distinguish accents in spoken English. For example, as shown in Arslan and Hansen (1996, 1997), Voice Onset Time (VOT) can be used to classify Mandarin, Turkish, German and American accented English. Our goal in this study is to develop an automatic system that classifies accents using VOT in unvoiced stops. VOT is an important temporal feature that is often overlooked in speech perception, speech recognition, and accent detection. Fixed-length frame-based speech processing inherently ignores VOT. In this paper, a more effective VOT detection scheme for unvoiced stops (/p/, /t/ and /k/) is introduced, which applies the non-linear energy tracking algorithm known as the Teager Energy Operator (TEO) across a sub-frequency band partition. The proposed VOT detection algorithm also incorporates spectral differences between the Voice Onset Region (VOR) and the succeeding vowel of a given stop-vowel sequence to classify speakers whose accents stem from different ethnic origins. The spectral cues are enhanced using one of four feature extraction methods: the Discrete Mellin Transform (DMT), the Discrete Mellin Fourier Transform (DMFT), and the Discrete Wavelet Transform at the lowest and the highest frequency resolutions (DWTlfr and DWThfr). A Hidden Markov Model (HMM) classifier is employed with these extracted parameters and applied to the problem of accent classification. Three language groups (American English, Chinese, and Indian) from the CU-Accent database are used. The VOT is detected with less than 10% error relative to the manually detected VOT, with success rates of 79.90%, 87.32% and 47.73% for English, Chinese and Indian speakers, respectively (the Indian figure includes atypical cases). It is noted that the DMT and DWTlfr features are well suited to parameterizing speech samples in which the vowel following the stop is substituted in accented speech. The accent classification success rates of the DMT and DWTlfr features are 66.13% and 71.67% for /p/ and /t/, respectively, in pairwise accent detection. Alternatively, the DMFT feature works on all accent-sensitive words considered, with a success rate of 70.63%. This study shows that effective VOT detection can be achieved by integrating TEO processing with spectral difference analysis in the VOR, and that the result can be employed for accent classification. © 2010 Elsevier B.V. All rights reserved.
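As a rough illustration of the energy-tracking idea the abstract refers to, the sketch below applies the standard discrete Teager Energy Operator, psi[n] = x[n]^2 - x[n-1]*x[n+1], to a synthetic signal and marks the first sample whose Teager energy crosses a threshold. It is a minimal sketch, not the paper's detector: the sub-frequency band partition and the spectral difference analysis are omitted, and the function names, the threshold value and the test signal are all assumptions made for illustration.

```python
import math


def teager_energy(x):
    """Discrete Teager energy psi[n] = x[n]^2 - x[n-1]*x[n+1].

    Returns values for the interior samples n = 1 .. len(x) - 2.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def first_onset(x, threshold):
    """Index into x of the first sample whose Teager energy exceeds
    threshold, or None if no sample does.

    Hypothetical helper: a single fixed threshold stands in for the
    paper's sub-band analysis, which is not reproduced here.
    """
    for i, energy in enumerate(teager_energy(x)):
        if energy > threshold:
            return i + 1  # teager_energy starts at sample n = 1
    return None


# Synthetic signal (assumed for illustration): silence, then a sinusoid.
amplitude, omega = 0.5, 0.3
silence = [0.0] * 50
tone = [amplitude * math.sin(omega * n) for n in range(100)]
signal = silence + tone

onset = first_onset(signal, threshold=1e-3)
print("estimated onset sample:", onset)
```

For a pure sinusoid A*sin(omega*n), the operator returns the constant A^2 * sin^2(omega), so it responds to both amplitude and frequency and jumps within a sample or two of the point where the oscillation starts. That sample-level response is what makes this kind of operator attractive for locating onsets that fixed-length frames would smear.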
Similar resources
Estimation of voice-onset time in continuous speech using temporal measures.
This paper proposes an automatic acoustic-phonetic method for estimating voice-onset time of stops. This method requires neither transcription of the utterance nor training of a classifier. It makes use of the plosion index for the automatic detection of burst onsets of stops. Having detected the burst onset, the onset of the voicing following the burst is detected using the epochal information...
Acoustic properties of foreign accent: VOT variations in Moroccan-accented Italian
The present study investigates the temporal parameter of VOT from a cross-language perspective, as far as native Moroccan, native Italian and Moroccan-accented Italian are concerned. The comparative analysis carried out underlines a language effect on the VOT duration across the three language varieties. The statistical test points out VOT as one of the acoustic properties that characterize the ...
Acoustic features for detection of phonemic aspiration in voiced plosives
Plosives in Indo-Aryan languages such as Hindi and Marathi display a 4-way contrast involving the two dimensions of voicing and aspiration. While many studies are available on the acoustics of aspiration in unvoiced stops due to their more universal presence in the world’s languages, voiced aspirated plosives have been less studied. Rather than the release duration cue of aspiration in unvoiced...
Detection of Automatic the VOT Value for Voiced Stop Sounds in Modern Standard Arabic (MSA)
Signal processing in current days is under studying. One of these studies focuses on speech processing. Speech signal have many important features. One of them is Voice Onset Time (VOT). This feature only appears in stop sounds. The human auditory system can utilize the VOT to differentiate between voiced and unvoiced stops like /p/ and /b/ in the English language. By VOT feature we can classif...
Separation of stop consonants
To extract speech from acoustic interference is a challenging problem. Previous systems based on auditory scene analysis principles deal with voiced speech, but cannot separate unvoiced speech. We propose a novel method to separate stop consonants, which contain significant unvoiced signals, based on their acoustic properties. The method employs onset as the major grouping cue; it first detects...
Journal title:
- Speech Communication
Volume 52, Issue
Pages -
Publication date: 2010